ABSTRACT

Matrix factorization has found widespread success and application
as a collaborative-filtering approach to recommendation.
Unfortunately, incorporating additional sources of evidence,
especially ones that are incomplete and noisy, is quite difficult
in such models, yet it is often crucial for obtaining further gains
in accuracy. For example, additional information about businesses
from reviews, categories, and attributes should be leveraged for
predicting user preferences, even though this information is often
inaccurate and partially observed. Instead of creating customized
methods that are specific to each type of evidence, in this paper
we present a generic approach to factorization of relational data
that collectively models all the relations in the database. By
learning a set of embeddings that are shared across all the
relations, the model is able to incorporate observed information
from all the relations, while also predicting all the relations of
interest. Our evaluation on multiple Amazon and Yelp datasets
demonstrates effective utilization of additional information for
held-out preference prediction; further, we present accurate models
even for cold-start businesses and products for which we do not
observe any ratings or reviews. We also illustrate the capability
of the model in imputing missing information and jointly
visualizing words, categories, and attribute factors.

1. INTRODUCTION 

Predicting user preferences for items such as commercial products,
movies, and businesses is an important and well-studied problem in
recommendation systems. Collaborative filtering using matrix
factorization, in particular, has found widespread adoption as the
tool of choice for this problem. By relying on co-occurrences in
the ratings, however, these methods do not perform well on users or
items that do not have ample observed ratings, i.e. users and items
that are rare or new to the system.

Fortunately, since users and items are part of a larger database,
extra relational information about such users and items can often
be utilized for predicting preferences. It is, for example, often
not difficult to obtain information such as product categories,
album genres, review text, and attributes/features of the items;
however, this external evidence is rarely complete or noise-free. A
number of existing approaches have thus been proposed to use these
sources of information for improving user preference prediction.
Koren [8], for example, combines the factorization model with an
encoding of the external information as fully-observed features.
Several studies have also investigated algorithms for incorporating
specific sources of information, for example modeling user reviews,
integrating context information, exploiting item taxonomy, and
learning changes in user preferences and expertise over time to
improve rating prediction. However, these approaches face a number
of disadvantages when applied to the heterogeneous, incomplete,
multi-relational schemas common in practice. First, these
approaches are designed for certain types of relations and are
restricted to relations of that type. Thus, it is not clear how
additional sources of information can be incorporated, for example,
how partially observed product categories can be used to improve
rating prediction in McAuley et al. Further, by training the model
to predict entries of only one or two relations, these approaches
ignore the dependencies between other relations and entities in the
database, and so cannot, for example, simultaneously predict the
cuisine of a restaurant and the users that will like it from the
user reviews of the restaurant. There is a need for a generic
machine learning approach that is able to leverage the dependencies
between users, items, and additional data for estimating user
preferences more accurately.

In this paper, we present a collective factorization model for
incorporating heterogeneous relational data for user preference
prediction in a domain-independent manner. Collective factorization
assigns a latent low-dimensional vector (an embedding or factor) to
every entity in the database, which is used to predict all of the
observed relations between pairs of entities. The collective model
thus extends the intuition behind matrix factorization based
recommendation systems that include embeddings for every user and
business/product, and is a generalization of models that assign
factors to every user, business/product, and review word. Since the
latent embeddings in collective factorization are used to model all
of the observed entries in the database, the model is capable of
predicting any type of relation between entities. Training the
embeddings to capture all of the dependencies also makes it easy to
integrate multiple sources of evidence for the same relation;
incorporating another source of information is as simple as
including an additional relation/table in the database. Further,
since the embeddings for all the entities are defined over the same
low-dimensional space, we can compute similarity between any pair
of entities, even if they are not directly observed in the same
relation. The collective factorization model provides further
benefits for practical deployment: the training algorithm is
efficient and scalable, and the model complexity can be controlled
by varying the embedding dimensionality.

We present a four-way evaluation of the collective factorization
model as applied to the Yelp and Amazon datasets. (1) We
demonstrate in §5.1 that the collective factorization model is
effective in incorporating additional sources of information and,
in particular, provides significant accuracy gains for predicting
user preferences. (2) In §5.2 we show that the proposed model is
especially useful for cold-start estimation, e.g. for estimating
preferences for new businesses and products for which no reviews or
ratings have been observed. (3) An advantage of the model is that
it can be used to impute missing values in the external data; we
present an evaluation of this capability in §5.3. (4) We explore
the implicit relations learned by the model that were not observed
in the data by visualizing categories, business attributes, and the
review words in the same two-dimensional plot in §5.4.

2. COLLECTIVE FACTORIZATION 

In this section, we present the probabilistic collective matrix 
factorization that jointly models the relations between entities, by 
leveraging data from all the other relations the entities participate in. 

2.1 Relational Data 

We represent relational data as a set of entities ($\mathcal{E}$)
and relations between them ($\mathcal{R}$). Formally, the observed
database, denoted by $\mathcal{D}$, consists of tuples of the form
$\{r_t, e_{t1}, e_{t2}, y_t\}_{t=1}^{T}$, where $r_t \in \mathcal{R}$
is a relation, $e_{t1}, e_{t2} \in \mathcal{E}$ are a pair of
entities, and $y_t \in \{0, 1\}$ denotes whether
$r_t(e_{t1}, e_{t2})$ holds (or not). For example, a simple
database that consists only of the user preferences would contain
products and users as the entities, and only a single relation $r$,
such that $r(e_{t1}, e_{t2}) = 1$ if user $e_{t1}$ liked the
product $e_{t2}$. As is clear from this example, many real-life
databases are only sparsely observed, in that only a very small
subset of the possible relation instances is observed, and the goal
of modeling such datasets is to complete the database.
Specifically, given any query $r_q(e_{q1}, e_{q2})$ that is absent
from the observed database, we would like to predict whether the
relation holds. Further, as we will see later, users' preferences
for items can be represented by one of the relations, with
additional information about users and items as other relations.
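
As a purely illustrative sketch (not part of the original formulation), the observed database can be held as a list of (relation, entity, entity, label) tuples, and database completion then amounts to answering queries for pairs that are absent from it. The `Observation` type, the helper names, and the toy users and products below are hypothetical.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    relation: str  # r_t, e.g. "likes"
    e1: str        # first entity e_t1
    e2: str        # second entity e_t2
    y: int         # 1 if r_t(e_t1, e_t2) holds, 0 otherwise


# Toy observed database D (hypothetical users and products).
D = [
    Observation("likes", "user:alice", "product:kettle", 1),
    Observation("likes", "user:alice", "product:toaster", 0),
    Observation("likes", "user:bob", "product:kettle", 1),
]


def is_observed(db, relation, e1, e2):
    """Return True if the query r(e1, e2) already appears in the database."""
    return any(o.relation == relation and o.e1 == e1 and o.e2 == e2 for o in db)


def unobserved_queries(db, relation, left, right):
    """All (e1, e2) pairs of `relation` that are absent from the database."""
    return [(a, b) for a in left for b in right
            if not is_observed(db, relation, a, b)]


users = sorted({o.e1 for o in D})
products = sorted({o.e2 for o in D})
# These are the entries a model would be asked to predict.
queries = unobserved_queries(D, "likes", users, products)
```

Here only the pair (bob, toaster) is unobserved, so it is the single query left to predict.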

2.2 Collective Factorization Model 

The collective matrix factorization model extends the commonly-used
matrix factorization model to multiple matrices by assigning each
entity a low-dimensional latent vector that is shared across all
the relations the entity appears in. Formally, we assign each
entity $e$ in our database a k-dimensional embedding vector (latent
factor) $\phi_e \in \mathbb{R}^k$ (the set of these embeddings for
all the entities in the database is $\Phi$). We model the
probability that $r(e_1, e_2)$ holds by:

    $P_\Phi[r(e_1, e_2) = 1] = \sigma(\phi_{e_1} \cdot \phi_{e_2})$    (1)

where $\sigma$ is the sigmoid function, $\sigma(s) = \frac{1}{1 + e^{-s}}$.
The probability that $r(e_1, e_2) = y$ is,

    $P_\Phi[r(e_1, e_2) = y] = \sigma(\phi_{e_1} \cdot \phi_{e_2})^{y} \, (1 - \sigma(\phi_{e_1} \cdot \phi_{e_2}))^{1 - y}$    (2)

The collective factorization model presents a number of advantages.
By sharing the entity embeddings amongst all the relations, it is
able to capture all the sources of evidence in a joint manner; for
example, the embeddings used to predict preferences for a user will
leverage information from other users' preferences in a
collaborative filtering fashion, but also from business attributes,
categories, and words that appear in the reviews (the details of
the model as applied to the datasets are described in §3). The
sharing of embeddings also allows them to be used to predict any of
the relations in the database, i.e. along with predicting ratings,
we can also predict business categories, attributes, and the text
of the reviews. A further advantage of learning collective
embeddings is that all the entities are effectively embedded in the
same k-dimensional space, and thus similarities and distances can
be computed and analyzed for any set of entities (we explore such
visualizations in §5.4). Finally, test-time inference takes
constant time per query and thus is very efficient: we only need a
dot-product between low-dimensional vectors to estimate the
probability that a relation holds between a pair of entities.

Footnote: As is common in recommendation systems, we can also
include a per-entity bias and a global matrix offset in Eq. (1). We
skip showing the biases in our formulation for brevity.
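
The scoring rule of Eqs. (1) and (2) can be sketched in a few lines, assuming plain Python lists as embeddings and omitting the optional bias terms. The entity names, the random initialization, and the function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def prob_holds(phi, e1, e2):
    """Eq. (1): probability that the relation between e1 and e2 holds."""
    return sigmoid(dot(phi[e1], phi[e2]))


def prob_label(phi, e1, e2, y):
    """Eq. (2): probability of observing label y (0 or 1) for the pair."""
    p = prob_holds(phi, e1, e2)
    return p ** y * (1.0 - p) ** (1 - y)


# One shared embedding per entity, whatever relations it appears in.
rng = random.Random(0)
k = 4  # embedding dimensionality (toy value)
entities = ["user:alice", "business:cafe", "category:Coffee", "word:espresso"]
phi = {e: [rng.gauss(0.0, 0.1) for _ in range(k)] for e in entities}

# The same business embedding scores both a preference and a category query.
p_like = prob_holds(phi, "user:alice", "business:cafe")
p_category = prob_holds(phi, "business:cafe", "category:Coffee")
```

Each query costs a single k-dimensional dot product, which is the source of the constant-time inference noted above.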

2.3 Estimating Entity Factors 

To estimate the parameters, i.e. the latent vectors $\Phi$, we
maximize the regularized log-likelihood of the observed training
instances (the observed entries in the database, $\mathcal{D}$).
Specifically, we maximize:

    $\hat{\Phi} = \arg\max_{\Phi} \ell(\mathcal{D}, \Phi)$    (3)

    $\ell(\mathcal{D}, \Phi) = \sum_{t=1}^{T} \log P_\Phi[r_t(e_{t1}, e_{t2}) = y_t] - \lambda \lVert \Phi \rVert_2^2$    (4)

To optimize this objective and estimate the latent factors, we use
stochastic gradient descent (SGD), cycling over the entries of the
database multiple times and updating the latent factors in the
direction of the stochastic gradient for each entry. In particular,
the $i$-th update that uses the $t$-th database entry is given by,

    $\phi_{e_{t1}}^{i+1} \leftarrow \phi_{e_{t1}}^{i} + \gamma (\epsilon_t \, \phi_{e_{t2}}^{i} - \lambda \phi_{e_{t1}}^{i})$    (5)

    $\phi_{e_{t2}}^{i+1} \leftarrow \phi_{e_{t2}}^{i} + \gamma (\epsilon_t \, \phi_{e_{t1}}^{i} - \lambda \phi_{e_{t2}}^{i})$    (6)

where $\epsilon_t = y_t - \sigma(\phi_{e_{t1}}^{i} \cdot \phi_{e_{t2}}^{i})$
and $\gamma$ is the learning rate (the constant factor from
differentiating the regularizer is absorbed into $\lambda$).
Similarly, the update rules for the per-entity bias and global
matrix offset, if used, can be derived easily. Along with strong
theoretical properties and widespread empirical success, the
algorithm is also memory and time efficient since it operates on a
single entry at a time; additionally, further potential for
scalability via parallelism has been demonstrated in recent
approaches.
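
The training loop can be sketched as follows, under stated assumptions: plain per-entry SGD on the logistic log-likelihood with an L2 penalty, no bias terms, and the prediction error $y_t - \sigma(\cdot)$ as the gradient signal. The toy data, the function names, and the learning rate of 0.1 (chosen so the toy problem converges quickly) are hypothetical and are not the settings used in the experiments.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(phi, data):
    """Unregularized log-likelihood of the observed entries."""
    total = 0.0
    for e1, e2, y in data:
        p = sigmoid(dot(phi[e1], phi[e2]))
        q = p if y == 1 else 1.0 - p
        total += math.log(max(q, 1e-12))  # clamp to avoid log(0)
    return total


def sgd_epoch(phi, data, lr, reg):
    """One pass over the database, one update per observed entry."""
    for e1, e2, y in data:
        v1, v2 = phi[e1], phi[e2]
        eps = y - sigmoid(dot(v1, v2))  # prediction error for this entry
        # Both updates are computed from the old values of v1 and v2.
        phi[e1] = [a + lr * (eps * b - reg * a) for a, b in zip(v1, v2)]
        phi[e2] = [b + lr * (eps * a - reg * b) for a, b in zip(v1, v2)]


# Toy data (hypothetical): both users like item "a" and dislike item "b".
data = [("u1", "a", 1), ("u1", "b", 0), ("u2", "a", 1), ("u2", "b", 0)]
rng = random.Random(0)
k = 3
phi = {e: [rng.gauss(0.0, 0.1) for _ in range(k)] for e in ("u1", "u2", "a", "b")}

before = log_likelihood(phi, data)
for _ in range(200):
    sgd_epoch(phi, data, lr=0.1, reg=0.001)
after = log_likelihood(phi, data)
```

Because each update touches only the two embeddings involved in the current entry, memory use and per-step cost are independent of the size of the database.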

3. COLLECTIVE FACTORIZATION FOR 
PREFERENCE PREDICTION 

The primary goal of recommendation systems is to provide
personalized recommendations of services and products to users,
often by learning their preferences from past rating history.
However, additional information can be leveraged to further improve
the predictions, especially for users and items that are infrequent
or new to the system. For example, ratings are often accompanied by
reviews whose text is informative about a user's tastes. The
prevalence of such reviews has led to tailored approaches that
combine text reviews with rating history to improve the rating
prediction task. Ganu et al., for instance, alleviate the problem
of coarse star ratings not being expressive by predicting
topic-based ratings solely from the text reviews. McAuley et al.
integrate text reviews with star ratings by aligning the item
latent factors with the review topic vectors obtained from topic
models in order to learn better embeddings for items, based on the
topics users write about in their reviews. Although such approaches
are able to model text accurately, it is unclear how they
generalize to other forms of additional information such as
structured data about products/businesses and users in the form of
business attributes, product categories, users' social networks,
and so on. This rich and diverse data can clearly aid in user
modeling; however, existing approaches that are domain-specific or
assume data to be fully observed do not generalize to this richly
heterogeneous and noisy data. In the following section we describe
how the collective matrix factorization described in §2 can be used
to collectively model all kinds of relations in such databases to
more accurately predict user preferences and, at the same time,
help in completing the missing information in the database.

Figure 1: Collective Factorization for the Yelp Dataset: Overview
of the entities and the relations, with the latter represented by
sparsely-observed matrices. The collective factorization model
contains low-dimensional dense embeddings for all the entities,
which are used to model the respective relations the entities
appear in (denoted by arrows). Collective factorization for Amazon
is similar, the difference being that instead of businesses, Amazon
contains products and does not contain attribute data.

3.1 Capabilities of Collective Factorization 

Collective factorization, described in §2, is a general and
effective method of jointly modeling all binary-valued relations in
a database: it does not require custom modeling for each relation
and, by learning shared universal embeddings for all entities, is
able to exploit inherent dependencies between the entities for
better prediction. Being a probabilistic model that implicitly
assumes noisy observations, it deals effectively with noise in the
observed relations. This further enables the model to impute the
relations as well, facilitating robustness to missing information.
We use collective factorization to improve the task of binary user
preference prediction by leveraging the additional information
about items and users. Specifically, collectively modeling
additional relations enables our model to learn better user and
item embeddings from item attributes, categories, and text reviews,
facilitating more accurate user preference prediction in
conjunction with past rating history.

Learning universal embeddings for entities further enables us 
to predict user preferences for new items that do not contain any 
rating history. The embeddings for such items are learned from 
their category and attribute data, while user preferences for them are 
learned from past user ratings of similar items. Joint modeling of 
relations also enables us to impute missing entries for the item and 
user relations, and further, learning shared embeddings in the same 
latent space enables us to estimate similarity between entities that 
do not explicitly participate in a relation, such as similarity between 
review words and product categories. 
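
Because all entity types share one latent space, similarity between entities that never co-occur in a relation can be read off the embeddings directly. The sketch below assumes cosine similarity as the measure (the text does not fix one) and uses hand-picked, hypothetical two-dimensional embeddings in place of learned ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0


# Hypothetical embeddings; in practice these would come from training.
embeddings = {
    "word:espresso": [1.0, 0.2],
    "category:Coffee & Tea": [0.9, 0.3],
    "category:Orthodontists": [-0.5, 1.0],
}


def most_similar(query, prefix):
    """Entity whose name starts with `prefix` that is closest to `query`."""
    candidates = [e for e in embeddings if e.startswith(prefix) and e != query]
    return max(candidates, key=lambda e: cosine(embeddings[query], embeddings[e]))


# A review word is compared against categories, although no word-category
# relation exists in the database.
nearest_category = most_similar("word:espresso", "category:")
```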

We show the efficacy of our model in the tasks described above by
evaluating it on databases from two domains, Yelp and Amazon. While
both databases primarily contain user ratings and reviews, they are
very different from each other in the type of reviews and services
they provide. Yelp contains user reviews about local businesses
ranging from restaurants to cardiologists, while Amazon is a
retailer that sells products and contains user reviews about the
products it sells. Apart from user ratings and reviews, both
databases contain additional information about businesses, products
and users, but contain different numbers of entities and relations.
Details about the two databases and how we convert the available
information into binary-valued relations for collective
factorization are described in the sections below.

3.2 Yelp Database 

Yelp contains rich relational data about businesses and users in
the form of business attributes, categories, and user reviews and
ratings. Hence, the entities present in the Yelp database we create
are businesses, categories, attributes, users and review words used
by users. We denote the sets of these entities by Sb, Sc, Sa, Su
and Sw, respectively. Each entity in the database is represented by
a k-dimensional embedding, as shown in the top part of Figure 1.

Each business in Yelp is categorized into a set of nearly 700
category types according to the nature of the business. The
categories available in Yelp include broad-level classes such as
Doctors, Education, and Restaurant, but also fine-grained
descriptions such as Italian, Hookah Bars, and Orthodontists. The
business category data can be viewed as a binary relation between
businesses (Sb) and categories (Sc), and is represented as the
matrix C.

Apart from categorization, Yelp also describes various attributes
for each business. Such attributes include type of parking,
delivers (or not), noise level and so on. We represent this
relation between businesses (Sb) and attributes (Sa) as a binary
matrix denoted by A. We transform attributes that are multi-valued
into multiple binary-valued attributes; for example, the attribute
"Smoking" in the dataset has "Yes", "No" and "Outdoor" as possible
values. To represent this with binary values, we unwrap it into
three separate attributes, namely "Smoking(Yes)", "Smoking(No)" and
"Smoking(Outdoor)", each of which is expressed as a binary value.
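
The unwrapping of multi-valued attributes can be sketched as below. The input layout, the function name, and the choice to emit an explicit 0 for every value the business does not take are assumptions made for illustration; the text only states that each unwrapped attribute is binary.

```python
def unwrap_attributes(business_attrs, multi_valued):
    """Turn raw attributes into (business, attribute, label) tuples.

    business_attrs: dict mapping business -> {attribute name: raw value}
    multi_valued:   dict mapping attribute name -> list of possible values
    """
    tuples = []
    for business, attrs in business_attrs.items():
        for name, value in attrs.items():
            if name in multi_valued:
                # One binary attribute per possible value.
                for option in multi_valued[name]:
                    label = int(value == option)
                    tuples.append((business, f"{name}({option})", label))
            else:
                # Already boolean, e.g. "Delivers".
                tuples.append((business, name, int(bool(value))))
    return tuples


# Hypothetical raw record for one business.
raw = {"cafe": {"Smoking": "Outdoor", "Delivers": False}}
options = {"Smoking": ["Yes", "No", "Outdoor"]}
A_tuples = unwrap_attributes(raw, options)
```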

A complex relationship between users and businesses exists in the
form of ratings and text reviews. We represent this user-business
relation in several forms. The ratings given by users on a 5-point
scale are converted to a binary-valued preference relation between
businesses (Sb) and users (Su), with high ratings (4 and 5) as
true (1) and low ratings (3 and below) as false (0). We denote the
binary matrix representing this user preference relation by R. The
relation R thus represents the likes and dislikes of users towards
businesses. The relationship between businesses (Sb) and words (Sw)
that are used in their reviews is represented by the relational
matrix BW, in which a true (1) value for a (business, word) tuple
denotes the usage of the word for the business in at least one
review. Similarly, the relation between users (Su) and the words
(Sw) used by them in reviews is represented as a binary matrix
denoted by UW.

              |Sa|     |Sc|      |Sb|      |Sw|       |Su|
Phoenix         92      472     22180     25277     102576
Las Vegas       92      416     14583     28551     147774
Madison         77      176      2118      6811       9737
Edinburgh       74      160      2840      6830       2484

               |A|        |C|       |R|      |BW|       |UW|
Phoenix     354068   10468960    475116   8533231   12339706
Las Vegas   235735    6066528    556326   7246237   16598396
Madison      41105     372768     35661    706026     987735
Edinburgh    41218     454400     20306    730871     435801

Table 1: Number of entities and observed tuples for Yelp

               Arts   Electronics     Average
         (Smallest)     (Largest)
|Sp|           4211         82067       33494
|Sc|            238           886         525
|Su|          24059        811034      194078
|Sw|           3916         32086       12169
|R|           27751       1196547      312851
|C|           20384        381796      145052
|PW|         382397      17150984     4424924
|UW|         667832      36621689     7415377

Table 2: Number of entities and observed tuples for Amazon
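
The conversion of ratings and reviews into the R, BW and UW relations can be sketched as follows. The review record layout and function names are hypothetical; only the thresholds (4 and 5 as positive, 3 and below as negative) and the "word used at least once" rule come from the text.

```python
def binarize_rating(stars):
    """Ratings of 4 or 5 are positive (1); 3 and below are negative (0)."""
    return 1 if stars >= 4 else 0


def build_relations(reviews):
    """Build R, BW and UW from (user, business, stars, words) records."""
    R = {}      # (business, user) -> 0/1 preference
    BW = set()  # (business, word): word used in at least one review of it
    UW = set()  # (user, word): word used by the user in at least one review
    for user, business, stars, words in reviews:
        R[(business, user)] = binarize_rating(stars)
        for w in words:
            BW.add((business, w))
            UW.add((user, w))
    return R, BW, UW


# Hypothetical, already tokenized reviews.
reviews = [
    ("alice", "cafe", 5, ["great", "espresso"]),
    ("bob", "cafe", 2, ["slow", "espresso"]),
]
R, BW, UW = build_relations(reviews)
```

Note that BW and UW built this way contain positives only, which is why negative entries have to be sampled separately during training.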

Figure 1 gives an overview of the various relations and entities
present in the Yelp database we create. It shows how different
entities participate in multiple relations, which leads to their
embeddings being shared among different relations. For example, the
embeddings for businesses (Sb) participate in modeling relations
C, A, R and BW.
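
The sharing pattern can be made explicit with a small schema table that maps each relation to the two entity types it connects. The dictionary layout and helper name are illustrative assumptions; the relation names and entity types follow the Yelp description above.

```python
# Each relation connects two entity types.
YELP_SCHEMA = {
    "R": ("business", "user"),
    "C": ("business", "category"),
    "A": ("business", "attribute"),
    "BW": ("business", "word"),
    "UW": ("user", "word"),
}


def relations_using(entity_type, schema=YELP_SCHEMA):
    """Relations whose entries are modeled with this entity type's embeddings."""
    return sorted(name for name, pair in schema.items() if entity_type in pair)


business_relations = relations_using("business")
user_relations = relations_using("user")
word_relations = relations_using("word")
```

Adding a new source of evidence then corresponds to adding one more entry to the schema, with no change to the model itself.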

3.3 Amazon Database 

The Amazon database contains user ratings and review data for
products sold on Amazon. Each product is categorized into multiple
broad-level, intermediate, and fine-grained categories. Hence, the
entities present in the Amazon database are products, categories,
users and review words, denoted by Sp, Sc, Su and Sw, respectively.
Similar to Yelp, each entity in the database is represented by a
k-dimensional embedding.

We represent the relationship between products (Sp) and categories
(Sc) as a binary-valued relational matrix, C. User ratings on
Amazon are given on a 5-point scale and are converted to a
binary-valued relation between products (Sp) and users (Su) in a
manner similar to Yelp. The relationship between users and products
in the form of text reviews is represented as two binary-valued
relations: PW, which captures the relationship between products
(Sp) and the review words (Sw) used for them, and UW, which
represents the relation between users (Su) and the review words
(Sw) used by them.

Similar to Yelp, the embeddings for different entities in the
Amazon database collectively model the relations they participate
in. For example, the embeddings for products (Sp) participate in
modeling relations R, C and PW in Amazon. The collective
factorization for the Amazon database looks similar to that shown
in Figure 1, except that Amazon, instead of businesses, has data
about products and does not contain the attribute relation.


4. EXPERIMENT SETUP 

In this section, we describe in detail the Yelp and Amazon
datasets, the data pre-processing, and the models and experiment
methodology.
Datasets: Yelp provides data from five cities, namely Phoenix, Las
Vegas, Madison, Edinburgh and Waterloo, but we focus on the first
four datasets due to the small size of the Waterloo dataset. Each
of the datasets in Yelp follows the same schema, allowing
evaluation of our model. For each of the datasets in Yelp we create
the relational matrices R, C, A, BW and UW. Table 1 shows the
number of entities participating in each relation of the various
Yelp datasets, as well as the number of observed entries for each
relation.

Amazon provides datasets from 25 broad categories of products, of
which we use 22 and omit the Books, Music and Movies & TV datasets
due to their size. Like Yelp, each of the datasets in Amazon
follows the same schema. For the datasets in Amazon we create the
R, C, PW and UW relation matrices. Table 2 shows the range of the
number of entities and observed tuples per relation over the Amazon
datasets.

Data Pre-processing: To create the BW, PW and UW matrices for Yelp
and Amazon, we tokenize the reviews, remove punctuation, numbers,
and stop words, and stem the words using the Porter stemmer. For
evaluation purposes, we only consider words that appear in at least
10 reviews. Since the BW and UW matrices only contain observed
words (all positives), we sample negative data entries in each
epoch by randomly selecting a set of words that were not observed
to be true for the business/user. The number of negative samples
chosen for each relation is the same as the number of observed
entries for the relation. We found the categories matrix C in Yelp
to be fairly comprehensive, and thus explicitly treat all
unobserved entries as negative (thus, effectively, C in Yelp is
fully observed and complete). Upon manual investigation, we find
that the categories assigned to the products in Amazon are not
always comprehensive, and thus we do not treat C as fully observed.
Since C in Amazon contains only observed categories (all
positives), we sample negative data entries in the same manner as
for BW, PW and UW. For our experiments, we only consider categories
that are associated with at least 5 businesses or products.
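
Negative sampling for a positives-only relation can be sketched as below. The text fixes only the number of negatives (equal to the number of observed entries) and that they are drawn from unobserved pairs; drawing the row and column uniformly at random with rejection, and the function name, are assumptions of this sketch.

```python
import random


def sample_negatives(positives, rows, cols, rng):
    """Sample as many unobserved (row, col) pairs as there are positives.

    Assumes every positive pair lies in rows x cols.
    """
    positives = set(positives)
    free_cells = len(rows) * len(cols) - len(positives)
    if free_cells < len(positives):
        raise ValueError("not enough unobserved entries to sample from")
    negatives = set()
    while len(negatives) < len(positives):
        pair = (rng.choice(rows), rng.choice(cols))
        if pair not in positives:  # reject observed positives
            negatives.add(pair)    # the set also removes duplicates
    return sorted(negatives)


# Hypothetical business-word positives.
bw_positives = [("cafe", "espresso"), ("cafe", "latte"), ("bar", "beer")]
businesses = ["cafe", "bar"]
words = ["espresso", "latte", "beer", "wine"]

# Called once per epoch so that each epoch sees fresh negatives.
bw_negatives = sample_negatives(bw_positives, businesses, words, random.Random(0))
```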

Methods: The primary benchmark for evaluating our models is
predicting user preferences towards businesses on Yelp and products
on Amazon, in particular to study how incorporating additional
information into the factorization model improves predictions. The
baseline model that performs standard matrix factorization of R
independent of other relations is denoted by R. We evaluate the
effect of integrating different relations on user preference
prediction by factorizing combinations of different matrices
collectively with the R matrix. For example, the model that
predicts user preferences for businesses on Yelp by incorporating
business categories is denoted by R+C. To predict whether a
relation holds between entities, we use the default logistic
threshold of 0.5 on the model probability. We measure the
performance of our relation prediction in terms of the F1 score,
defined as the harmonic mean of precision and recall, which is a
more informative measure than accuracy for imbalanced label
distributions. To present a combined score for all the datasets in
each of Yelp and Amazon, we aggregate the predictions of all the
datasets and compute a single, micro-averaged F1 score over them.
Such a micro-averaged F1 score gives more weight to the larger
datasets than to the smaller ones; one way to weigh all datasets
equally is to average the F1 scores across datasets. We use a
regularization constant $\lambda = 0.001$, a learning rate
$\gamma = 0.01$, and latent-factor dimension k = 30 for Yelp and
k = 5 for Amazon, chosen based on performance on validation data.
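
The difference between the micro-averaged combined score and a plain average of per-dataset scores can be seen in a small sketch. The function names and the toy labels and probabilities are hypothetical; the 0.5 threshold and the definition of F1 as the harmonic mean of precision and recall follow the text.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """True positives, false positives and false negatives at a threshold."""
    tp = fp = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two toy datasets of different sizes: (true labels, model probabilities).
datasets = {
    "large": ([1, 1, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]),
    "small": ([1, 0], [0.3, 0.7]),
}
per_dataset = {name: confusion_counts(y, p) for name, (y, p) in datasets.items()}

# Micro-average: pool the counts of all datasets, then compute one F1.
pooled = tuple(sum(c[i] for c in per_dataset.values()) for i in range(3))
micro_f1 = f1_score(*pooled)

# Equal weighting: average the per-dataset F1 scores.
macro_f1 = sum(f1_score(*c) for c in per_dataset.values()) / len(per_dataset)
```

In this toy case the large dataset is predicted perfectly and the small one not at all, so the pooled score (0.8) sits much closer to the large dataset than the equal-weight average (0.5) does.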















                          R     R+C    R+PW    R+UW  R+C+PW  R+C+UW
Arts                   65.1    68.5    68.9    84.0    68.1    86.4
Automotive             66.8    73.9    73.5    84.4    74.0    85.7
Baby                   67.1    70.7    71.3    86.3    70.5    85.8
Beauty                 71.7    76.0    75.5    87.1    75.6    88.7
Cell Phones & Acc.     55.8    58.9    58.8    75.2    58.4    75.1
Clothing Access.       92.3    93.8    94.0    93.6    93.9    96.2
Electronics            66.5    71.0    70.8    82.4    70.8    84.2
Gourmet Foods          64.5    72.1    71.9    85.2    72.2    88.3
Health                 67.0    71.8    71.7    84.5    71.6    86.4
Home Kitchen           68.1    72.6    72.7    83.9    72.9    86.4
Industrial Scientific  91.1    93.4    93.7    94.5    93.7    96.7
Jewelry                68.0    74.8    74.3    81.5    74.5    87.4
Musical Instruments    61.1    71.1    70.9    85.3    71.0    88.0
Office Products        63.9    67.3    67.2    83.1    67.5    85.1
Pet Supplies           65.1    70.2    69.7    84.2    70.1    85.9
Shoes                  97.6    96.0    95.9    95.0    95.8    97.6
Software               54.7    58.3    58.4    68.0    58.3    71.7
Sports Outdoors        71.9    76.6    76.9    85.5    76.4    88.7
Tools & Home Imp.      66.5    72.3    72.4    84.1    72.7    86.6
Toys Games             65.8    70.7    70.6    84.3    70.5    86.8
Video Games            68.1    71.1    70.8    82.6    71.2    83.7
Watches                61.6    64.2    63.7    85.6    64.1    87.7
Combined               72.1    75.0    75.8    85.5    75.1    87.3

Table 3: Held-out User Preference prediction on Amazon: F1 (%) of
predicting held-out user preferences from R. The models being
evaluated vary in the number of relations modeled when learning
the factors, with additional relations often resulting in more
accurate models across datasets.


5. RESULTS 

In this section, we evaluate the effect of incorporating relational
information in predicting user preferences. First, we present the
accuracy of predicting preferences on held-out items in §5.1 to
test the performance when the entities already contain a few
observed ratings. We further investigate the performance of
different models on cold-start estimation in §5.2 where, for
example, we predict user preferences for businesses with no past
ratings or reviews available. One of the major advantages of
modeling all relations jointly is the ability to complete other
missing relational matrices while at the same time predicting user
preferences. We investigate the performance of our collective
factorization model in predicting business attributes in Yelp and
product categories in Amazon in §5.3 by utilizing category,
ratings, and reviews. Finally, utilizing the fact that the model
embeds all entity types in the same k-dimensional space, we present
visualizations in §5.4 that explore similarities between entities
for which explicit relations are not observed.

              Phnx.  L. Vegas   Madison   Ednbgh.  Combined
R              72.2      70.5      68.5      75.2      71.3
R+A            73.7      71.1      70.5      76.3      72.3
R+BW           74.2      70.8      71.4      75.8      72.4
R+C            75.1      72.2      72.2      76.4      73.6
R+UW           80.0      78.5      78.5      78.5      79.2
R+A+C          74.2      71.1      70.7      74.8      72.5
R+A+BW         74.1      70.7      70.8      76.5      72.3
R+C+BW         74.3      70.9      71.7      77.2      72.6
R+A+UW         80.0      78.5      78.9      79.1      79.2
R+C+UW         80.9      79.9      80.8      79.0      80.4
R+A+C+BW       73.9      70.8      71.1      76.4      72.3
R+A+C+UW       80.8      79.3      77.9      78.5      79.9

Table 4: Held-out User Preference prediction on Yelp: F1 (%) of
predicting held-out ratings from R for Yelp as the amount and type
of additional information is varied.

Figure 2: Precision/Recall Curves for Held Out User Preferences

5.1 User Preference Prediction 

The most important problem in recommendation systems is to predict
user preferences for products/businesses. To show how our model
improves on predicting user preferences by leveraging additional
information for existing entities, we carry out an evaluation of
held-out predictions. To split the Yelp and Amazon data into
training, validation and test sets for evaluation, we randomly
choose 70% and 80% of the observed ratings from Yelp and Amazon,
respectively, for training and equally divide the remaining data
into validation and test sets. We present the performance of
different models by varying the user relations, and the business
relations for Yelp and product relations for Amazon, that are
available during training for user preference prediction.
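
The split described above can be sketched as follows, assuming a uniform random shuffle of the observed ratings; the function name and the integer stand-ins for rating tuples are hypothetical.

```python
import random


def split_ratings(ratings, train_frac, rng):
    """Random train/validation/test split of the observed ratings.

    `train_frac` of the ratings go to training; the remainder is divided
    equally between validation and test.
    """
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    rest = shuffled[n_train:]
    n_val = len(rest) // 2
    return shuffled[:n_train], rest[:n_val], rest[n_val:]


ratings = list(range(100))  # stand-ins for observed rating tuples

# 70% training for Yelp, 80% for Amazon, as stated in the text.
yelp_train, yelp_val, yelp_test = split_ratings(ratings, 0.7, random.Random(0))
amzn_train, amzn_val, amzn_test = split_ratings(ratings, 0.8, random.Random(0))
```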

User Preference Prediction on Amazon: We expect that incorporating
additional information about products and users on Amazon, such as
categories and text reviews, should improve the prediction of user
preferences. Results for collective factorization of combinations
of various relational matrices containing information about
products and users on Amazon with the R matrix are shown in
Table 3. Our baseline model achieves an F1 score of 72.1% on
predicting user preferences, and relative gains of 4% and 5.1% are
observed when incorporating information about the products in terms
of their categories (C) and the review words used for them (PW),
respectively. A significant relative improvement of 18.5% over the
baseline is obtained by incorporating the relationship between the
users and the review words they use (UW). It is clear from this
that user reviews are quite indicative of users' likes and dislikes
for various aspects of a product, which further helps to predict
user preferences. An additional relative increase of 2.1% is
obtained by incorporating product category (C) information on top
of the user-words (UW) relation. Figure 2 shows that, even though
different models perform similarly in the high precision and recall
regions, models that incorporate more information dominate the
majority of the plot.

User Preference Prediction on Yelp: Table 4 shows the accuracy of
our collective factorization model in user preference prediction on
Yelp when combining various relational matrices about businesses
and users with the R matrix. Our baseline model achieves an F1
score of 71.3%, with a relative increase of 1.4% when incorporating
information about the businesses in terms of their attributes (A)
or the review words used for them (BW). Incorporating business
categories (C) improves upon the baseline model by 3.22%. Similar
to Amazon, a significant relative improvement of 11.07% over the
baseline is obtained by incorporating the relationship between the
users and their review words (UW). A further increase of 1.5% is
obtained by incorporating business category (C) information on top
of the user-words (UW) relation. When adding information about
business attributes along with the business categories and
user-words relations, we find that the prediction accuracy falls
only slightly, by 0.62%. A reason for this may be a lack of
dependence between user preferences and business attributes, so
that modeling attributes along with preferences slightly reduces
the accuracy. The precision/recall curves in Figure 2 show how
incorporating different kinds of information about users and
businesses affects user preference prediction. It is clear that the
user-word relation provides higher gains than incorporating
information about businesses, but more importantly, integrating
information about both businesses and users achieves the best
performance.

As is clear from the above results, our collective factorization
model is effective at integrating additional information about
products, businesses and users when predicting user preferences for
users with available rating history. In both databases, incorporating
the product-word (PW) or business-word (BW) relation improves user
preference prediction, which shows that incorporating the words used
for products and businesses helps learn better embeddings for both.
Additionally, the increase from incorporating the category information
(C) suggests that the addition of categories helps the model learn
user biases towards certain categories along with their other
preferences. The significant improvements in user preference
prediction from integrating knowledge about the words used by users
(UW) show the importance of user-written reviews for learning better
user embeddings: they capture in-depth likes and dislikes that are
reflected not in the ratings alone but in the detailed reviews.

Including the per-entity biases and a global offset for each
relational matrix gave similar accuracy on user-preference prediction
when factorizing all relations collectively. The baselines for both
Yelp and Amazon were higher than the current baselines, and thus the
gains from additional information were reduced.

5.2 Cold-Start Evaluation 

One of the major challenges faced by recommendation systems 
is to predict ratings for new products, businesses and users for 
which no reviews or ratings have been observed. This problem is 
not just specific to recommendation systems, but common to all 
relation prediction frameworks. Most of the factorization models 
for relation prediction fail to incorporate information about entities 
from relations, apart from the relation to be predicted, and thus 
provide poor cold-start performance. 

Our collective factorization model benefits greatly from learning 
shared factors for entities by leveraging all sources of information 
about the entity. Hence, in the absence of observed data for a 
particular relation, embeddings learned from other relations can still 
be used to predict the relation. Specifically, we show that our model 



                              R*    R+UW     R+C  R+C+UW
Arts                        60.8    64.5    65.4    86.9
Automotive                  58.7    60.0    73.0    87.7
Baby                        60.8    57.9    70.7    85.6
Beauty                      60.7    56.9    75.9    86.4
Cell Phones & Acc.          54.1    53.0    56.8    71.3
Clothing Accessories        59.4    62.3    94.1    95.0
Electronics                 59.3    57.7    71.3    83.8
Gourmet Foods               61.7    56.0    71.6    88.7
Health                      59.4    61.2    71.0    86.3
Home Kitchen                58.7    60.6    72.4    84.2
Industrial Scientific       61.4    62.0    91.9    96.0
Jewelry                     61.5    64.9    76.7    77.3
Musical Instruments         59.8    60.3    70.5    88.3
Office Products             59.8    58.7    67.2    83.2
Pet Supplies                59.1    59.8    70.9    85.1
Shoes                       59.2    58.7    95.8    97.1
Software                    51.6    54.2    59.0    72.6
Sports Outdoors             61.8    62.1    77.6    87.6
Tools & Home Impr.          59.3    61.9    72.5    85.8
Toys Games                  61.3    59.7    70.5    84.8
Video Games                 59.3    61.3    67.9    77.7
Watches                     60.6    59.6    65.6    85.9
Combined                    59.6    59.9    75.7    86.7

Table 5: Cold-Start User Preference prediction on Amazon: F1
scores of different collective factorization models in predicting the
user preferences for products on Amazon for which no ratings or
reviews are observed. *Conventional matrix factorization, R, is a
trivial straw man in that it does not have any way to differentiate
amongst cold-start businesses.

can learn factors for products and businesses for which no reviews
or ratings were observed from their categories and attributes, and use
them to predict user preferences. For evaluation on Amazon, we
withhold all observed cells of the relation R for a random 10% of
the products. We use 90% of the data for the remaining products for
training and the rest for validation. We split the Yelp data similarly,
by withholding all observed cells of the relation R for a random
10% of the businesses. Apart from the variety of collective models,
we also include the uninformative straw man that has the same
prediction for all cold-start products on Amazon and businesses
on Yelp in user preference prediction, evaluated by computing the
F1 score when the factors for new products and businesses are
randomly initialized to small values.
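
The split described above can be sketched as follows. This is only an
illustrative sketch, not the code used for the experiments: the function
name cold_start_split, its parameters, and the representation of R as a
list of (user, item, label) cells are assumptions made for the example.

```python
import random


def cold_start_split(ratings, holdout_frac=0.1, train_frac=0.9, seed=0):
    """Split observed cells of R for a cold-start evaluation.

    ratings: list of (user, item, label) tuples (hypothetical format).
    Returns (train, validation, test, cold_items).
    """
    rng = random.Random(seed)
    items = sorted({item for _, item, _ in ratings})
    rng.shuffle(items)
    n_cold = max(1, round(holdout_frac * len(items)))
    cold_items = set(items[:n_cold])

    # Every observed cell of a cold-start item is withheld for testing,
    # so the model never sees a rating for these items.
    test = [cell for cell in ratings if cell[1] in cold_items]
    rest = [cell for cell in ratings if cell[1] not in cold_items]

    # The cells of the remaining items are split into training/validation.
    rng.shuffle(rest)
    n_train = int(train_frac * len(rest))
    return rest[:n_train], rest[n_train:], test, cold_items
```

Cells of other relations (for example C or A) for the withheld items
would stay in the training data, since they are the only evidence
available for the cold-start items.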

Product Cold-Start on Amazon: Predicting user preferences for
new products a priori is an exciting problem since it can help sellers
on Amazon quickly identify the target audiences that would like the
products. Users that have biases towards certain categories prefer
products that cater to their needs. Using our collective factorization
model, we can integrate category knowledge for new products,
along with user reviews and preferences for existing products, to
predict preferences for new ones. Table 5 shows the performance
of different models that vary in the information used to predict
user preferences for products that do not have any rating history
in the database. The results corroborate the fact that learning good
embeddings just for users (via UW) is not enough to predict user
preferences. We find that incorporating product information in terms
of its categories (C) obtains an accuracy as high as 75.7%, and
further, integrating the user-word relation (UW) results in prediction
F1 as high as 86.7%. On the other hand, Figure 3 suggests that adding
categories by itself would obtain a higher accuracy than adding both
category and user-word relations if the threshold is tuned.
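
To make the remark about threshold tuning concrete, the following
sketch computes F1 at a given decision threshold and picks the threshold
that maximizes F1 on held-out validation cells. It is an illustration
under assumptions: the helper names are hypothetical, and the scores
are assumed to be predicted probabilities in [0, 1].

```python
def f1_score(labels, scores, threshold):
    """F1 of the positive class when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(labels, scores):
    """Return the observed score that gives the highest validation F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_score(labels, scores, t))
```

A model whose precision/recall curve dominates in one region can lose
to another model at a fixed threshold, which is why a tuned threshold
may change the ranking of the models.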

Business Cold-Start on Yelp: Table 6 shows how collective
factorization is able to use the business category (C) and business attribute

Figure 3: Precision/Recall Curves for Cold-Start User Preferences



                 Phnx.  L. Vegas   Madison   Ednbgh.  Combined
R*                57.0      56.9      58.0      56.3      57.0
R+UW              57.8      57.0      56.4      59.7      57.4
R+A               72.8      69.4      70.6      74.2      71.1
R+C               75.1      72.1      72.3      75.5      73.5
R+A+C             74.9      71.2      71.7      77.6      73.0
R+A+UW            78.1      74.5      78.4      75.1      76.3
R+C+UW            79.6      76.2      74.2      70.0      77.6
R+A+C+UW          79.6      77.4      77.8      72.6      78.2

Table 6: Cold-Start User Preference prediction on Yelp: F1
scores of different collective factorization models in predicting the
user preferences for businesses for which no ratings or reviews are
observed. *For matrix factorization R all cold-start business rows
are empty, and thus it is a straw man.



                              C*     C+R    C+PW  C+R+PW
Arts                        49.5    52.1    85.4    83.1
Automotive                  49.9    53.9    87.6    87.8
Baby                        52.3    54.4    88.2    88.0
Beauty                      49.7    58.5    89.8    89.1
Cell Phones & Acc.          48.6    52.7    90.5    90.0
Clothing Accessories        49.8    67.7    90.1    90.1
Electronics                 49.3    57.8    90.4    90.5
Gourmet Foods               49.5    57.3    87.4    87.1
Health                      49.7    57.8    89.5    89.8
Home Kitchen                49.9    57.3    89.9    89.7
Industrial Scientific       50.3    66.7    89.7    89.6
Jewelry                     49.2    59.0    83.9    84.9
Musical Instruments         50.3    53.0    86.3    87.0
Office Products             49.0    53.3    86.6    86.8
Pet Supplies                50.4    59.4    90.6    90.3
Shoes                       50.0    77.2    90.3    90.6
Software                    49.6    53.7    87.8    86.6
Sports Outdoors             49.6    59.6    89.0    88.8
Tools & Home Impr.          49.7    55.8    87.7    87.8
Toys Games                  49.6    59.8    89.2    89.3
Video Games                 49.1    61.4    89.8    90.3
Watches                     51.1    53.7    84.6    83.8
Combined                    49.7    59.4    88.9    88.8

Table 7: Imputing Product Categories on Amazon: F1 evaluation
for imputing the categories for products on Amazon for which
no category data was observed. *Matrix factorization C cannot
differentiate at all amongst the cold-start products.



                 Phnx.  L. Vegas   Madison   Ednbgh.  Combined
A*                38.9      37.6      38.2      34.3      38.1
A+R               57.5      53.5      51.3      53.4      55.5
A+C               79.6      77.4      74.0      76.2      78.3
A+BW              81.8      81.8      77.5      72.7      81.0
A+R+C             78.7      75.9      72.9      74.7      77.2
A+R+BW            82.1      80.6      77.2      74.4      80.9
A+C+BW            82.1      81.0      77.1      76.6      81.1
A+R+C+BW          82.1      80.6      77.3      75.7      80.9

Table 8: Imputing Business Attributes on Yelp: Performance of
different collective factorization models in predicting attributes for
businesses without any observed attributes. *Similar to other
cold-start results, matrix factorization A here is also an uninformative
baseline that has the same prediction for all business attributes.


(A) relations to learn good embeddings for businesses and predict
user preferences for businesses with no available rating history. As
expected, learning good embeddings just for users (via UW) is
not enough to predict user preferences. Incorporating information
about businesses in terms of attributes (A) and categories (C)
obtains prediction F1 as high as 71.1% and 73.5%, respectively.
Collectively integrating all information about businesses and users,
such as categories (C), attributes (A) and the user-word
relation (UW), leads to prediction F1 as high as 78.2%. Figure 3(b)
shows how integrating additional information about both businesses
and users outperforms models with relatively less information.

By obtaining results close to those in §5.1, we demonstrate
that the collective factorization model is able to almost completely
overcome the lack of existing user preferences for products and
businesses by utilizing other relations.

5.3 Imputing Missing Entries in Database 

Joint factorization of all relations in the database, by learning
shared universal embeddings for entities to predict user preferences,
also enables us to predict missing entries in the incomplete
relational matrices at the same time as they help in predicting user
preferences. For example, integrating the business category (C) and
business-review words (BW) relations in Yelp, while aiding the
prediction of user preferences, also helps in completing missing
attributes for businesses in A. This ability of our collective
factorization model to impute missing entries in the database has
several advantages. For example, predicting missing attributes in the
Yelp database helps complete the database, supports more accurate
prediction of user preferences, and helps users make more informed
decisions when choosing between businesses.

In this section, we evaluate our model's efficacy in completing
additional relational information about entities by predicting product
categories (C) in Amazon and business attributes (A) in Yelp.
Similar to §5.2, we present cold-start evaluations, where we show
how additional information about products and businesses can be
leveraged to predict categories for products in Amazon and attributes
for businesses in Yelp without any observed category and attribute
information, respectively. For evaluation, we withhold all observed
cells of the relations C and A for a random 10% of the products on
Amazon and businesses in Yelp, respectively. We use 90% of the
remaining data for training and the rest for validation. Apart from
the variety of collective models, similar to §5.2, we also include
the uninformative straw man that has the same prediction for all
cold-start products on Amazon for product category prediction and
businesses on Yelp for business attribute prediction.

Product Category Completion on Amazon: Table 7 shows the accuracy
of our model in predicting missing product category data for
products with no observed category data during training by learning
useful product embeddings from other relations. We see that
incorporating the products-review words (PW) relation helps in
predicting product categories with an F1 score of 88.9%. Additionally
incorporating the user preference data does not improve the accuracy,
which suggests that it is the reviews that help the most in learning
good embeddings for products when predicting the categories they
belong to.

Business Attribute Completion on Yelp: Quite surprisingly, 15.91%
of the total 42,151 businesses in the Yelp database do not contain
any information about their attributes. In Table 8 we show the
accuracy of our model on attribute completion for businesses with
no attributes observed during training. As expected, incorporating
the business-category (C) and business-word (BW) relations helps the
most in predicting attributes for new businesses. An F1 score as high
as 78.3% is achieved on incorporating the C relation, and further
addition of the BW relation achieves an F1 score of 81.1%. Integrating
ratings data on top does not affect the model much and obtains an F1
score of 80.9%.

Figure 4: Visualization of Category and Review Word factors, showing
similarity of factors for semantically similar words and categories.
Best viewed in color.


5.4 Jointly Visualizing Entity Embeddings 

Finding relationships and similarity between entities that do not
participate in the same relation in the database schema is a
challenging problem in relation learning. Similarity between entities
has important applications in data visualization and developing
intuitive user interfaces, amongst others. Since our model defines
embeddings for all entities over the same k-dimensional space, we can
compute similarities between any pair of entities even if they do not
appear in the same relation. For example, reviews, along with
indicating user preferences, also contain information about the
business categories and attributes. In this section, we show how the
learned embeddings for review words, categories and attributes in Yelp
reveal the similarity between them. We project the embeddings of a
select subset of categories, attributes and a subset of similar review
words onto a 2-dimensional plot using the t-Distributed Stochastic
Neighbor Embedding (t-SNE) iD technique for dimensionality
reduction. This is a randomized, approximate technique that attempts
to maintain the distances between entities in k dimensions when
projecting them to two dimensions. The vectors used here for
categories, attributes and review words are obtained by collectively
factorizing the A, R, C and BW relations on the Yelp Phoenix data.
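
Because all entities share one embedding space, finding the entities
closest to a given category reduces to a nearest-neighbor lookup. The
sketch below illustrates this with cosine similarity; the choice of
cosine similarity, the helper names, and the toy vectors are
assumptions for illustration and are not taken from the learned model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(query, embeddings, k=3):
    """Return the k entities most similar to `query`, of any entity type."""
    scored = [(name, cosine(embeddings[query], vec))
              for name, vec in embeddings.items() if name != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-dimensional embeddings (hypothetical values).
embeddings = {
    "category:Mexican": [1.0, 0.1],
    "word:carnita": [0.9, 0.2],
    "word:coolant": [-0.1, 1.0],
}
closest = nearest("category:Mexican", embeddings, k=1)
```

The category and the words never co-occur in a single relation, yet
they can be compared directly because their vectors live in the same
space.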

Visualizing Categories and Review Words: The efficacy of our
model in learning inter-category similarity and similarity between
categories and review words in Yelp is shown in Figure 4. For
example, our model is able to learn that Indian and Pakistani cuisines
and Korean and Japanese cuisines are similar to each other. The Gyms
and Fitness & Instruction categories for businesses are similar, and
Beer, Wine & Spirits is related to Nightlife. Our model also learns
the semantic similarity between categories and review words. For
example, the words closest to Auto Parts & Supplies are rotor, coolant,
wiper, and transmission, and the words closest to Arts & Entertainment
are auditorium, theatre, imax, movie-going and orchestra. For categories
related to food, our model learns the names of dishes as being closest
to the categories, suggesting that users mostly talk about the
dishes when reviewing restaurants. For example, the words closest
to Mexican are carnita, flauta, chimichanga, and relleno, the words
closest to Coffee & Tea are frappe, chai, macchiato, and frappuccinno,
and the words closest to Bakeries are scone, croissant, and quiche. Our
model is also able to learn word embeddings in such a manner that words
used in reviews for dissimilar businesses lie approximately between
both categories. For example, words like enroll, taekwondo and
curriculum, which may belong to reviews of both education-related
businesses and fitness centers, lie in between the Education and
Fitness & Instruction categories. Similar observations are made for
words that are close to the Bakeries and Coffee & Tea categories.

Visualizing Categories, Attributes, and Words: In Figure 5 we
plot a subset of categories and the attributes and review words
that are close to them. From the figure, we see that our model
learns similar embeddings for attributes that co-exist for certain
types of businesses. The proximity of the Ambience(divey) and Good
For(latenight) attributes suggests that places that have a divey
ambience are good for late nights. Also, the proximity of Happy
Hour and Music(Jukebox) shows that businesses that have a happy
hour mostly also have a jukebox. The proximity of the categories Fast
Food and Coffee & Tea to the attributes Drive-Thru, Wi-fi(free) and
Alcohol(none), of the category Doctors to the attribute By Appointment
Only, and of the category Arts & Entertainment to the attribute
Music(live) all demonstrate that the model is able to learn how certain
categories of businesses are most likely to have certain attributes. We
also see from the figure that even though the reviews do not explicitly
talk about the attributes and categories, our model is able to capture
the similarity between them simultaneously. For example, words
like jukebox, karaoke, bartender, and chianti lie close to the
attributes Good For(latenight) and Happy Hour and the category
Nightlife.

Figure 5: Visualization of Categories, Attributes, & Words. Best
viewed in color.


6. RELATED WORK 

This work builds upon a large and growing area of machine learning
applied to recommendation systems and modeling of structured
datasets. We describe a subset of these approaches that are directly
related to, and inspired, our proposed work.

The idea of using low-dimensional vectors as latent factors has
found widespread use in recommendation systems. The task of
suggesting products/items to users is traditionally viewed as matrix
completion where the sparse rating matrix with users as rows and
items as columns is to be completed with predicted ratings. Sarwar
et al. [20] show how Singular Value Decomposition (SVD) can be
used to decompose the rating matrix into low-rank feature matrices to
reduce the dimensions of the rating matrix. This gave rise to the
widely used matrix factorization techniques for predicting ratings Go) in
which the user and item factors capture the similarities amongst
them. Conventional matrix factorization techniques predict ratings
directly as the dot product of the factors of the user and the item, and
use regularized least-squares as the loss function to optimize. Our
model here, however, uses the probabilistic interpretation of matrix
factorization (B and uses the sigmoid function with log-likelihood,
as it is a generalization of PCA to binary matrices H).

Since many collaborative filtering applications have auxiliary
information available for users and products/businesses, a
number of approaches have studied how this information can be
combined with matrix factorization for better rating prediction. If
the auxiliary observation can be treated as fully observed and
noise-free, it can be used in conjunction with the neighborhood model to
augment the matrix factorization objective [8]. In practice, however,
the auxiliary data is commonly noisy and incomplete, and thus has
to be explicitly modeled to leverage it adequately. McAuley et al.
Ha combine matrix factorization with review text by modeling the
words using an LDA topic model, and aligning the item/user latent
vector with the review text topic vectors to learn better factors for
rating prediction. Ling et al. ca similarly combine review text, but
use a mixture of Gaussians instead of matrix factorization, avoiding
any transformation of the factors and thus retaining the
interpretability of latent topics. Ganu et al. j?) predict ratings for
restaurants from the review text alone, but require additional
manually labeled data for classifying the sentiment and aspects of
sentences. External information in the form of item taxonomies has
also been investigated; for example, Weng et al. combine users'
preferences with the item types to learn type-level preferences, while
also addressing the cold-start problem for items with only taxonomic
information, but do not employ any factorization model. Koenigstein
et al. 0 use global item biases in the Yahoo! Music dataset by
using shared parameters amongst items with a common ancestor in
the taxonomy hierarchy. Koren @ incorporates temporal dynamics
into matrix factorization to learn changes in user movie preferences
that occur over time, whereas McAuley and Leskovec CD argue
that, to enjoy certain kinds of products such as beer and gourmet
foods, one requires a certain level of expertise; hence their model
tries to combine temporal ratings data to make better personalized
recommendations according to the experience of each user. The methods
described above propose models that are specific to their domains,
and thus the generalization capabilities of these models are unclear.

An alternative approach is to combine all the data and represent
it using tensors, allowing the use of tensor factorization, an
extension of matrix factorization to tensors. For example, the approach
by De Lathauwer et al. O is used to predict tags for user-item
pairs [22, 18] and to predict user ratings by integrating context
information as a tensor [6]. The main shortcoming of such approaches,
however, is that they model only a single additional source of
information, and further, focus on predicting only the relation of
interest.

To model multiple relations in a joint manner, collective matrix
factorization (ID extends the idea of matrix factorization to multiple
matrices. The rows and columns of the matrices have corresponding
latent factors, with shared latent factors for entities that appear
as rows or columns in multiple matrices. These approaches learn
parameters for entities by jointly factorizing all of these matrices,
and thus learn factors that predict multiple matrices. The empirical
evaluation on relatively small databases with only two relations
did not show considerable improvements; this is expected since
collective factorization requires, and would benefit from, larger
datasets. We use this model with the logistic/sigmoid formulation
in this paper, combined with stochastic gradient descent (SGD) for
optimization, and evaluate on 26 large-scale, multi-relation
real-world datasets from Amazon and Yelp, combined.
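
As a rough illustration of the idea of shared factors, the sketch
below jointly factorizes two toy relations with a sigmoid link and
per-cell SGD updates on the regularized log-likelihood. It is a minimal
sketch under assumptions, not the implementation evaluated in this
paper: the function names, hyperparameter values, and toy data are
hypothetical, and per-entity biases and global offsets are omitted.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train(relations, dim=8, lr=0.1, reg=0.01, epochs=200, seed=0):
    """relations: dict name -> list of (row_entity, col_entity, 0 or 1).

    Each entity has ONE embedding, shared by every relation it is in.
    """
    rng = random.Random(seed)
    cells = [cell for rel in relations.values() for cell in rel]
    emb = {}
    for a, b, _ in cells:
        for e in (a, b):
            if e not in emb:
                emb[e] = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for a, b, y in cells:
            va, vb = emb[a], emb[b]
            p = sigmoid(sum(x * z for x, z in zip(va, vb)))
            g = y - p  # gradient of the log-likelihood w.r.t. the dot product
            for i in range(dim):
                xa, xb = va[i], vb[i]
                va[i] += lr * (g * xb - reg * xa)
                vb[i] += lr * (g * xa - reg * xb)
    return emb


def predict(emb, a, b):
    return sigmoid(sum(x * z for x, z in zip(emb[a], emb[b])))


# Toy data: two products appear only in the category relation C.
shoes = ["p1", "p2", "p_new_shoe"]
books = ["p3", "p4", "p_new_book"]
relations = {
    "R": [("u1", "p1", 1), ("u1", "p2", 1), ("u1", "p3", 0), ("u1", "p4", 0),
          ("u2", "p1", 0), ("u2", "p2", 0), ("u2", "p3", 1), ("u2", "p4", 1)],
    "C": [(p, "cat:shoes", 1) for p in shoes]
         + [(p, "cat:books", 0) for p in shoes]
         + [(p, "cat:books", 1) for p in books]
         + [(p, "cat:shoes", 0) for p in books],
}
emb = train(relations)
print(predict(emb, "u1", "p_new_shoe"), predict(emb, "u1", "p_new_book"))
```

In this toy setting the two new products have no cells in R, so any
difference between their predicted preferences comes entirely from the
embeddings learned through the category relation.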

Our formulation of relational data, and the collective factorization 
model, can be easily extended. For example, the current formulation 
assumes at most a single relation exists between any specific pair of 
entities (since Pi^ is independent of relation r in Eq[TJ. Although this 
assumption holds for many applications, we can extend this model to 
multiple relations between the same pair of entities by introducing
latent factors for the relations, similar to CP-decomposition (or
PARAFAC) and the recently proposed RESCAL [15]. Krompaß et al.
CD obtain highly compressed representations of large triple stores
by using RESCAL to represent them as Probabilistic Databases
(PDBs) and present methods to efficiently answer complex queries
on PDBs by breaking them into sub-queries.

Our model also assumes binary absence/presence relations; however,
non-Boolean binary relations can be modeled either by treating
them as multiple binary relations, or by using a different function
than the sigmoid to predict the value of the relation, while n-ary
relations can be modeled as tensors with CP decomposition. It is
worth mentioning that the Yelp dataset contains such deviations
from our assumptions: the business attributes are discrete valued
(Wi-Fi: Free, Paid, No), which are converted to multiple Boolean
yes/no entities (Wi-Fi:Free, Wi-Fi:Paid, Wi-Fi:No), while the
reviews in both Amazon and Yelp are 3-way relations between users,
businesses, and words, which we split into two binary relations.
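
The two conversions described above can be sketched as follows. The
function names and input formats are hypothetical, and the sketch only
produces the observed (positive) cells; how unobserved cells are
treated is outside its scope.

```python
def binarize_attributes(business_attrs):
    """Turn discrete-valued attributes into Boolean attribute entities.

    {'b1': {'Wi-Fi': 'Free'}} -> {('b1', 'Wi-Fi:Free')}
    """
    return {(business, f"{name}:{value}")
            for business, attrs in business_attrs.items()
            for name, value in attrs.items()}


def split_reviews(reviews):
    """Split 3-way (user, business, words) reviews into two binary relations.

    Returns (UW, BW): sets of (user, word) and (business, word) pairs.
    """
    uw, bw = set(), set()
    for user, business, words in reviews:
        for word in words:
            uw.add((user, word))
            bw.add((business, word))
    return uw, bw
```

For instance, a single review by u1 of b1 containing the words taco
and salsa yields two cells in UW and two cells in BW.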


7. CONCLUSIONS AND FUTURE WORK 

In this paper, we presented the application of the collective
relational factorization model for improving user preference prediction.
By learning entity embeddings that are shared between all the
relations the entity participates in, the model is able to combine
multiple sources of evidence, predict relations of multiple types, and
further, allow computation of similarity between entities that do not
share any direct relations. We presented empirical evaluation of user
preference prediction that demonstrates that the collective model
achieves higher accuracy with access to additional evidence. We
also investigated cold-start evaluation for businesses, and showed
that the collective model is accurate in predicting ratings (and
attributes) even when none of the ratings (and attributes, respectively)
of the business have been observed. We additionally explored joint
visualization of categories, business attributes, and review words,
facilitated by the collective factors. The code for the algorithm, along
with data processing and evaluation, is available for download.

We would like to explore a number of avenues for future work.
As we described in §6, we will extend our collective factorization
representation of relational data to support n-ary relations (by using
tensor factorization) and non-binary, multi-valued relations (for
example, by introducing additional factors for relations). These
extensions will enable us to support a wider variety of relations and
databases; we will, for example, be able to model the complete Yelp
schema, including attributes such as tips, locations, temporal
information, and review tags, with a single collective factorization
model. We will also investigate applications of this model on
relational databases from other domains.